Data driven multidialectal phone set for Spanish dialects

نویسندگان

  • Mónica Caballero
  • Asunción Moreno
  • Albino Nogueiras
چکیده

This paper addresses the use of a data-driven approach to determine a multidialectal phone set for an automatic speech recognition system for Spanish dialects. This approach is based on a decision tree clustering algorithm that tries to cluster contextual units of different dialects. This procedure avoids the definition of a global phonetic inventory and the previous study of similarity of sounds. The procedure is applied in Spanish as spoken in Spain, Colombia and Venezuela. Results show differences between phonemes that share the same SAMPA symbol in different dialects and also detect similarities between phonemes that are represented by different symbols in dialectal variants. Recognition results using this multidialectal approach overcome the monodialectal ones.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Multidialectal Spanish acoustic modeling for speech recognition

During the last years, language resources for speech recognition have been collected for many languages and specifically, for global languages. One of the characteristics of global languages is their wide geographical dispersion, and consequently, their wide phonetic, lexical, and semantic dialectal variability. Even if the collected data is huge, it is difficult to represent dialectal variants...

متن کامل

Multidialectal Acoustic Modeling: a Comparative Study

In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. Approaches evolved differ in the way of evaluating the similarity of sounds between dialects, and the decision tree structure applied. Proposed systems are tested with Spanish dia...

متن کامل

Multidialectal Spanish Modeling for ASR

This paper describes the latest advances in our ongoing work in the area of Spanish multidialectal speech recognition. This work deals with the suitability of using a single multidialectal acoustic modeling for all the Spanish variants spoken in Europe and Latin America. The objective is two fold. First, it allows to use all the available databases to jointly train and improve the same system. ...

متن کامل

A Multidialectal Parallel Corpus of Arabic

The daily spoken variety of Arabic is often termed the colloquial or dialect form of Arabic. There are many Arabic dialects across the Arab World and within other Arabic speaking communities. These dialects vary widely from region to region and to a lesser extent from city to city in each region. The dialects are not standardized, they are not taught, and they do not have official status. Howev...

متن کامل

Arabic Dialect Identification Using a Parallel Multidialectal Corpus

We present a study on sentence-level Arabic Dialect Identification using the newly developed Multidialectal Parallel Corpus of Arabic (MPCA) – the first experiments on such data. Using a set of surface features based on characters and words, we conduct three experiments with a linear Support Vector Machine classifier and a meta-classifier using stacked generalization – a method not previously a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004